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Abstract 

Background: The small ruminant parasite Haemonchus contortus is the most widely used parasitic nematode in 
drug discovery, vaccine development and anthelmintic resistance research. Its remarkable propensity to develop 
resistance threatens the viability of the sheep industry in many regions of the world and provides a cautionary 
example of the effect of mass drug administration to control parasitic nematodes. Its phylogenetic position makes 
it particularly well placed for comparison with the free-living nematode Caenorhabditis elegans and the most 
economically important parasites of livestock and humans. 

Results: Here we report the detailed analysis of a draft genome assembly and extensive transcriptomic dataset for 
H. contortus. This represents the first genome to be published for a strongylid nematode and the most extensive 
transcriptomic dataset for any parasitic nematode reported to date. We show a general pattern of conservation of 
genome structure and gene content between H. contortus and C. elegans, but also a dramatic expansion of 
important parasite gene families. We identify genes involved in parasite-specific pathways such as blood feeding, 
neurological function, and drug metabolism. In particular, we describe complete gene repertoires for known drug 
target families, providing the most comprehensive understanding yet of the action of several important 
anthelmintics. Also, we identify a set of genes enriched in the parasitic stages of the lifecycle and the parasite gut 
that provide a rich source of vaccine and drug target candidates. 

Conclusions: The H. contortus genome and transcriptome provide an essential platform for postgenomic research 
in this and other important strongylid parasites. 
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Background 

Resistance to broad spectrum anthelmintic drugs is now 
widespread in parasites of domestic livestock [1,2] 
and there are increasing concerns about the sustain- 
ability of human parasite control programs using mass 
administration of the same drugs [3]. Consequently, 
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there is an urgent need to understand the genetic 
mechanisms underlying anthelmintic resistance and to 
discover new methods of chemical and non-chemical 
control. However, the genomic and genetic resources 
required to underpin research in parasitic nematodes 
are lacking [4-6]. The free-living nematode Caenorhabdi- 
tis elegans is a powerful model system, but it has clear 
limitations for the study of parasitic species [7] . Although 
the need to develop workable parasitic nematode model 
systems is widely recognized, most human helminth 
species are not amenable to experimental study. In con- 
trast, Haemonchus contortus, a gastrointestinal parasitic 
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nematode of small ruminants, has a successful track 
record in anthelmintic resistance [7,8], drug discovery 
[9,10] and vaccine [11-13] research. It is amongst the most 
experimentally tractable parasites for a number of reasons: 
adult females are relatively large and produce thousands of 
eggs per day, allowing the production of large amounts of 
biological and genetic material, the infective larvae (L3) 
can be viably cryopreserved, and in vivo studies, including 
genetic crosses, can be undertaken in the natural host [8] . 
Its phylogenetic position within the most closely related 
group of parasites to C. elegans facilitates comparative 
genomics and heterologous gene expression, allowing 
functional studies to be performed on H. contortus genes 
and regulatory elements [4,14]. As this parasite is a stron- 
gylid nematode, research on it is of particular relevance to 
the most economically significant parasites of grazing 
ruminants and to the human hookworms [15]. H. contor- 
tus itself causes major economic loss in small ruminants 
worldwide; it is highly pathogenic and unsurpassed in its 
ability to develop resistance to every anthelmintic used in 
its control. Notably, these include a number of core drugs 
used for mass drug administration programs in humans 
[3], Anthelmintic resistance in H. contortus and related 
strongylids now threatens the viability of the sheep indus- 
try throughout the world [2] . This represents both a warn- 
ing and a useful model for the consequences of the 
widespread intensive use of anthelmintics that are now 
being used to control human parasites in the developing 
world [3], 

H. contortus has a direct and rapid lifecycle (Figure SI in 
Additional file 1); adults reside and mate in the abomasum 
of the ruminant host, then females produce eggs to be 
excreted in the feces. The eggs embryonate, develop and 
hatch as first stage larvae (LI), develop and molt to 
become second stage larvae (L2), then molt again to 
become third stage larvae (L3) in the feces. The L3 migrate 
onto pasture to be ingested by the grazing ruminant host. 
The L3 shed the retained L2 cuticle (exsheath) in the 
rumen, travel to the abomasum and develop through to 
the fourth larval stage (L4) then adults in two to three 
weeks [16]. As voracious blood-feeders, H. contortus L4 
and adults can cause severe hemorrhagic gastritis, hypo- 
proteinemia, anemia and edema, with acute infections 
resulting in death of the animal host. One adult female 
can produce up to 10,000 eggs per day [17] and a single 
animal can harbor thousands of worms. This extremely 
high fecundity, under conducive host and environmental 
conditions, can give rise to population explosions and 
devastating disease outbreaks. Development occurs most 
rapidly in warm humid conditions, but the L4 stage can 
undergo arrested development within the host to survive 
adverse conditions such as prolonged drought or cold 
winters [18]. This feature of facultative arrested develop- 
ment, aided by movement of domestic livestock and 



climate change, has ensured the worldwide distribution 
and success of this parasite even though its evolutionary 
origins were in sub-Saharan Africa [19]. 

One striking feature of H. contortus, in common with 
related nematode species, is the extremely high level of 
genetic diversity that has been reported in both labora- 
tory and field populations; this is thought to be predo- 
minantly a function of its large effective population size 
[20-22]. Genetic variation may underlie both the para- 
site's remarkable ability to adapt to different climatic 
regions and host species [23] and its alarming propen- 
sity to develop drug resistance. This high level of genetic 
polymorphism has also provided a major challenge to 
genome assembly, necessitating the production of an 
inbred line from which to prepare DNA template. 

In this paper, we describe the assembly and annotation 
of the draft genome of MHco3(ISE).Nl, a genetically well 
characterized inbred H. contortus strain that is susceptible 
to all major anthelmintic drugs. The most extensive com- 
parative transcriptomic sequencing and analysis yet 
described for a parasitic nematode was undertaken on 
representative life-stages and the nematode gut to explore 
different aspects of nematode biology, evolution and para- 
sitism. This draft genome provides a platform for future 
research in anthelmintic resistance, drug discovery and 
vaccine development using H. contortus, currently the 
most important experimental model for the strongylid 
nematode group. 

Results and discussion 

Genome structure and content 

We assembled a draft sequence of the H. contortus gen- 
ome based on data from a mixture of sequencing tech- 
nologies (Material and methods; Additional results and 
methods in Additional file 1). Our final draft assembly 
consists of 67,687 contigs linked into 26,044 scaffolds of 
total length 370 Mb, including 23.8 Mb of inferred gaps 
between contigs, with an average and N50 scaffold 
length of 14,206 and 83,287 bp, respectively (Table SI 
in Additional file 1). This is a significantly larger gen- 
ome size than the approximately 60 Mb predicted with 
flow cytometry [24] but it is consistent with a prediction 
of approximately 200 to 300 Mb inferred from a pilot 
annotation of two large contiguous regions of the gen- 
ome [14], making this the largest nematode genome 
sequenced to date (Table SI in Additional file 1). The 
overall base composition of all sequence contigs was 
close to neutral at a mean 41.3% GC, slightly higher 
than that for other clade V nematodes. Approximately 
93% of conserved eukaryotic genes can be identified in 
the assembly, suggesting that our draft assembly repre- 
sents at least that fraction of the H. contortus genome 
[25], as many of these models are incomplete or split 
between contigs (Table S2 in Additional file 1). Despite 
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this, our current assembly is of similar quality to some 
other published draft nematode genome sequences 
[26-30] (Table SI in Additional file 1). 

We used extensive transcriptomic evidence from 
across the H. contortus lifecycle (Materials and methods; 
Table S3 in Additional file 1) to guide de novo gene 
model curation for protein-coding genes. We predict a 
similar number of protein-coding genes to the C. elegans 
genome (21,799 versus 20,532), but significantly lower 
gene density of 59 genes per Mb, with only 7% of the 
genome being protein coding, compared to 200 genes per 
Mb and a protein coding content of approximately 30% 
in the C. elegans genome (Figure 1). The average coding 
sequence length is similar in H. contortus and C. elegans 
(1,127 bp in H. contortus compared to 1,371 bp in 
C. elegans) but the average gene length is more than 



double in the parasite (6,564 bp compared to 3,010 bp in 
C. elegans). A closer look at a subset of 2,822 H. contortus 
and C. elegans one-to-one orthologs shows the discre- 
pancy is explained by an expansion in both the number 
and length of introns in H. contortus (average of 10 
introns per gene, average size 633 bp; relative to 6 introns 
per gene, average size 340 bp in C. elegans). An expan- 
sion in intronic sequence has also been reported in the 
closely related necromenic nematode Pristionchus pacifi- 
cus but to a less dramatic extent (average of nine introns 
per gene, average size 309 bp) [26]. Over 80% of 
H. contortus and C. elegans one-to-one orthologs present 
on the same scaffold occur on the same C. elegans chro- 
mosome (Table S4 in Additional file 1), suggesting wide- 
spread conservation of synteny between the two species 
(Figure 2). However, gene order is generally poorly 
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Figure 1 Evolution of genome content in clade V nematodes. A maximum-likelihood phylogeny based on concatenated alignment of 
single-copy genes. Values on edges represent the inferred numbers of births {+) and deaths (-) of gene families along that edge. Note that our 
approach cannot distinguish gene family losses from gains on the basal branches of this tree, so, for example, the value of 1,263 gene family 
gains on the basal branch of clade V will include gene families lost on the branch leading to the clade IV species. The first column of pie charts 
represents the gene family composition of each genome - the area of the circle is proportional to the predicted proteome size, and wedges 
represent the numbers of proteins predicted to be either singletons (that is, not members of any gene family), members of gene families 
common to all nine genomes, members of gene families present only in a single genome, and members of all other gene families. The second 
column of pie charts represents the genomic composition of the species with published genome sequences. All ingroup nodes had 100% 
bootstrap support. 



Laing et al. Genome Biology 2013, 14:R88 
http://genomebiology.eom/2013/14/8/R88 



Page 4 of 16 




Figure 2 One-to-one orthologs connecting H. contortus scaffolds and C. elegans chromosomes (a) All 130 H. contortus scaffolds with at 
least five one-to-one orthologs between the two species, with scaffolds ordered to maximize colinearity with the C, elegans genome, (b) The 
relative positions of one-to-one orthologs between the two largest H. contortus scaffolds and C. elegans chromosomes, demonstrating the lack of 
conservation of gene order within syntenic blocks. 



conserved, which is consistent with comparisons between 
C. elegans and related nematodes [27,31], but regions of 
conserved microsynteny are apparent and may prove use- 
ful in supporting orthology of divergent genes. 

Reflecting this relatively expanded genome, approxi- 
mately 29% of the draft genome assembly consists of 
repetitive sequence (compared to approximately 16% in 
C. elegans), including 2,434 copies of the characteristic 
HcRep element previously reported from a number of 
trichostrongylid species [14,32-34]. The repetitive 
sequence also includes representatives of many known 
repeat families in other nematodes, with approximately 
6% of the genome derived from LINEs, 2.5% from long 
terminal repeat retrotransposons and 5% from DNA 
transposons, including TTAA-specific, Mariner-like and 
MuDR-superfamily elements, together with evidence of 
elements related to Tel and Tc4 of C. elegans. Despite 
belonging to similar families, H. contortus repeats repre- 
sent independent expansions to those in other clade V 
nematodes, as repeat libraries from H. contortus show 



little similarity to genome sequence from other species, 
and vice versa. Our transcriptomic data suggest that 
active transposition is occurring, with evidence of 
expression for 4 out of 26 gene models annotated with 
transposase domains and 49 out of 482 reverse tran- 
scriptase domain-containing proteins. Expressed pro- 
teins appear to belong to a range of DNA transposon 
types, and to Gypsy-related and LINE retro-elements. 

In C. elegans, around 17% of genes are in operons [35] - 
tightly linked clusters of two to eight genes, which are 
co-transcribed from the same promoter. The resulting 
polycistronic pre-mRNAs are resolved by fraws-splicing 
with spliced leader (SL) SL1 and SL2 sequences. Most fre- 
quently, SL1 is fraws-spliced to the first gene in an operon 
and downstream genes are SL2 trans- spliced. The struc- 
ture (gene complement, order and orientation) of around 
23% of C. elegans operons is conserved in the H. contortus 
genome. The structure of a further 10% of C. elegans oper- 
ons appear to be partially conserved, where at least two 
orthologs are present on the same scaffold in the expected 
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order and orientation, but one or more genes are in a dif- 
ferent order, inverted or absent. Functional constraints are 
thought to conserve the intergenic distance in C. elegans 
operons to approximately 100 bp but genes in H. contortus 
operons are further apart: the average intergenic distance 
of genes with a conserved operon structure is 992 bp 
(median 621 bp, largest 8,329 bp), and the operon encod- 
ing ion channel subunits Hco-deg-2H and Hco-deg-3H 
has an intergenic distance of 2,342 bp [36]. 

Overall, SL1 tra«.s-splicing was detected in 6,306 
H. contortus genes and SL2 trans-splicing was detected 
in 578 genes. Of these, 318 trans-spliced genes were in 
the putative conserved operons identified above (Addi- 
tional methods in Additional file 1). All 126 first genes 
in operons were fraws-spliced to SL1 (SL2 fraws-splicing 
was detected in five putative first genes, but examination 
of their loci suggests they are downstream genes in new 
operons in this species); 119 downstream genes were 
trans-spliceA to SL1 and 73 were fraws-spliced to SL2. If 
SL2 fra«5-splicing is the definitive criterion in identify- 
ing operons, the relatively low level of SL2 trans -splicing 
to downstream genes suggests that either operon func- 
tion is less frequently conserved than operon structure 
in H. contortus or that undetected, divergent SL2 
sequences are present. However, the relatively high fre- 
quency of SL1 ira«s-splicing in genes that are also SL2 
frans-spliced (approximately 77%; 56 of 73 in conserved 
operons, 445 of 578 in all genes identified) suggests SL1 
ira«s-splicing of downstream genes may also be rela- 
tively common in this species. 

We employed two complementary approaches to glo- 
bal comparison of the H. contortus gene set, using the 
Inparanoid algorithm to look in detail at orthologs with 
C. elegans and P. pacificus, and OrthoMCL for a wider 
view of gene family evolution with other clade V nema- 
todes. Of 5,937 orthology groups between C. elegans 
and H. contortus, 5,012 are one-to-one orthologs, while 
an additional 899 orthologs could be identified in 
H. contortus and P. pacificus but not C. elegans, suggest- 
ing they have been lost in the C. elegans lineage (Table S5 
in Additional file 1). A number of orthology groups are 
significantly expanded in H. contortus, including a family 
of 180 Haemonchus paralogs to a single C. elegans gene 
that lacks any functional annotation (Table S6 in Addi- 
tional file 1). Other expanded groups include genes with 
likely roles in parasitism, such as cysteine-rich secreted 
proteins, together with a set of helicase domains that 
include some with predicted signal peptides. Global analy- 
sis of the evolution of entire gene families (clusters of simi- 
lar genes) across the clade V nematodes confirms this 
pattern of significant diversity within the clade (Figure 1), 
and allowed us to identify H. contortus genes lacking clear 
orthologs in other clade IV or V nematodes (Figure S2 
and Table S8 in Additional file 1). This shows that the 



Haemonchus-specific proteome is enriched in genes 
encoding polypeptides involved in proteolysis, neurotrans- 
mission and carbohydrate metabolism, and in secreted 
proteins and those expressed in the cuticle. While some of 
these genes are explored in detail in the more focused ana- 
lyses below, others may represent novel candidate genes 
involved in the host-parasite interface. 

Gene expression in parasite life-stages and gut 

As H, contortus progresses through its lifecycle, it must 
adapt to different environments with differing food 
sources and energy requirements (Figure SI in Addi- 
tional file 1) and this is reflected in differential gene 
expression. RNA-seq was used to analyze gene expres- 
sion in six parasite life-stages and the adult female gut. 
Samples were made in triplicate from independent 
batches of parasite material for every stage, allowing sta- 
tistically robust comparison of relative gene expression 
between the stages (Materials and methods; Table S3 in 
Additional file 1). We found significant expression for 
17,483 genes in total from the 6 life-stages studied; with 
between 13,962 (in L3) and 15,569 (in adult males) 
genes expressed in each stage. A total of 11,295 genes 
were significantly up- or down-regulated through the 
lifecycle (Figure 3 and Materials and methods) and we 
used annotation with Gene Ontology (GO) terms to 
investigate their broad functions. Metabolic enzyme 
expression throughout the parasite lifecycle was ana- 
lyzed in more detail and will be discussed separately. 

Genes up-regulated in the development from egg to 
the LI stage include those associated with muscle devel- 
opment and motor activity, while up-regulated in the 
egg are genes associated with oxidoreductase activity, 
apoptosis, body morphogenesis and development, larval 
and embryo development, as well as DNA replication 
and chromosome organization. This pattern is consis- 
tent with the progression from a developing embryo- 
nated egg to a motile and actively feeding larval stage. 

In the transition to the L3 stage, a decrease in the 
expression of genes associated with the myosin complex 
and motor activity and various metabolic processes are 
consistent with the nematode entering a quiescent state. 
Among the up-regulated genes there is an association 
with oxygen transport and heme binding. Oxidoreductase 
enzymes are over-represented and may reflect the 
increased need to detoxify a build up of endogenous 
waste, consistent with previous studies showing higher 
cytochrome P450 (CYP) activity in the H. contortus L3 
than in LI or adult stages [37]. CYPs have also been 
shown to be up-regulated in response to reduced food 
intake [38]. This is consistent with the significant 
increase in gluconeogenesis from the LI to L3 - which is 
also increased in the C. elegans dauer [39] - and the up- 
regulation of acetyl-CoA metabolic process, likely to 
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Figure 3 Differential gene expression across the H. contortus lifecycle. (a) Numbers of genes significantly up-regulated (red) and down- 
regulated (blue) between life-stages, (b) The expression profiles of genes determined to be differentially expressed between any pair of stages 
are shown in clusters ordered by the stage at which they are most highly expressed. The innermost circle shows normalized expression levels 
during the egg stage with further concentric circles showing expression in the LI, L3, L4 stages, followed by male worms, female worms and 
then female gut samples. Mean expression values were taken over three replicates and are shown on a log2 scale. 



reflect metabolism of fat stores in the non-feeding L3. 
Genes associated with binding of cobalamin (vitamin 
B12) are also up-regulated. Cobalamin has been shown 
to be strongly concentrated and stored in the infective L3 
of other gastrointestinal nematodes [40] and a ready sup- 
ply may be required for rapid larval development after 
ingestion by the host. 

The L4 is the first blood-feeding stage of H. contortus. 
The transition from the quiescent L3 stage to the motile 
L4 stage is reflected in significant up-regulation of many 
genes, including genes associated with motor activity, 
the myosin complex and locomotion as well as various 
metabolic processes. The binding of oxygen, lipids and 
sugars, possibly associated with active feeding, is also 
up-regulated, as are changes in the expression of genes 
linked to response to oxidative stress that may reflect 
the reactivation of the parasite from its dormant stage. 
A significant increase in the expression of genes asso- 
ciated with collagen and cuticle development and body 
morphogenesis, consistent with parasite growth, is also 
observed. Interestingly, heme-binding genes were both 
up- and down-regulated, perhaps reflecting an increase 
in heme load from blood feeding and a decrease in CYP 
activity. 

Transition from the L4 to the adult stages is charac- 
terized by several changes. In the transition to the 



female stage, 1,658 genes are up-regulated and correlate 
with gender-specific development as well as embryogen- 
esis (adult females contain oocytes and eggs at various 
developmental stages), such as genitalia development, 
embryo development, oogenesis, ovulation, germ-line 
cell cycle switching, meiosis regulation and regulation of 
vulval development. A significant increase in DNA repli- 
cation processes is also apparent. Between the L4 and 
adult male stage, lower expression in the adult male of 
genes linked with body morphogenesis, molting, col- 
lagen and cuticle development, oxidoreductase activity, 
heme binding and response to oxidative stress were 
observed among the various alterations in expression. 
Increased expression of genes annotated with the GO 
term 'structure molecule activity' is due to up-regulation 
of a number of major sperm protein genes. These sex- 
specific gene expression patterns were confirmed by 
comparison between adult female and male parasites, 
and it is clear that many genes are highly enriched in 
male worms, being expressed only at low levels in other 
stages. The large set of genes highly expressed in both 
eggs and adult females were again apparent. 

Finally, we investigated gene expression in the 
H. contortus intestine, the major organ of digestion and 
detoxification in the nematode, by comparing the female 
gut sample with the whole female worm. Consistent 
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with data from H. contortus gut EST libraries [41], 
increased expression of genes with protein kinase, 
cysteine-type peptidase and cysteine-type peptidase 
inhibitor activities predominated. Genes associated 
with sugar and cobalamin binding were significantly up- 
regulated, as were genes associated with transport of 
cations, anions and oligopeptides. Oxidoreductase activ- 
ity was also increased, consistent with the expression 
pattern of detoxification genes in C. elegans [42]. 

Metabolic pathways and chokepoint analysis 

Comparisons between H. contortus and the free-living 
nematodes revealed 22 enzyme classifications (ECs) that 
were restricted to the parasite (Table S9 in Additional 
file 1). While more detailed analysis is required, metabo- 
lism of amino acids and carbohydrates clearly differ 
between these two groups. For example, lysine 6-amino- 
transferase (EC 2.6.1.36) catalyzes lysine to glutamate, 
which can be further converted to a-ketoglutarate, an 
intermediate of the tricarboxylic acid cycle. Lysine 
6-aminotransferase was previously considered restricted 
to prokaryotes; thus, its activity in H. contortus needs to 
be confirmed. 

A summary of up- and down-regulated metabolic 
enzymes across all life-stages is shown in Table S10 in 
Additional file 1. The transition through eggs, LI, L3 
and L4 showed a striking pattern: from LI to L3, most 
enzyme classifications were down-regulated, including 
those involved in carbohydrate, lipid and energy meta- 
bolism, but many of these were up-regulated again in 
the transition to L4. This is consistent with the L3 being 
a stage in which development is arrested, analogous to 
the dauer larva in C. elegans. Further support for this 
comparison is the up-regulation of two enzymes that 
independently convert isocitrate to 2-oxo-glutarate, 
while most other parts of the tricarboxylic acid cycle are 
down-regulated. Furthermore, 2-oxo-glutarate is an 
entry metabolite into the ascorbate and aldarate meta- 
bolic pathway, which is implicated in increased lifespan 
in Drosophila [43]. The L4-to-male transition shows a 
decrease in lipid metabolism coupled with an increase 
in amino acid metabolism. 

Metabolic chokepoints - reactions that uniquely con- 
sume or produce a metabolite - are enzymes that seem 
likely to be essential to the parasite and so may be poten- 
tial targets for future drug development. Analysis of the 
H. contortus network revealed 362 chokepoint reactions, 
five of which passed additional criteria of lacking isoen- 
zymes and being divergent from known human enzymes 
[44] (Table Sll in Additional file 1). Comparison of these 
against the Therapeutic Target Database revealed that two 
are known potential targets for anthelmintic drugs: treha- 
loses-phosphatase (EC 3.1.3.12) is a member of the 
sucrose metabolism pathway, and is currently being 



researched as a potential drug target in the filarial nema- 
tode B. malayi [45], while dTDP-4-dehydrorhamnose 
3,5-epimerase (EC 5.1.3.13) is currently of interest in 
Mycobacterium tuberculosis [46]. This validation of the 
approach justifies further analysis of the other three cho- 
kepoint enzymes identified. 

Neuromuscular drug targets 

In nematodes, the pentameric ligand-gated ion-channel 
(pLGIC) family is particularly numerous, with 64 members 
identified in H. contortus via homology with Caenorhabdi- 
tis spp., P. pacificus and RNA-seq data (Figure 4). They 
are of great importance in parasitic nematodes because 
they are targets of the majority of the currently available 
anthelmintic drugs (as summarized in Table S12 in Addi- 
tional file 1; the |3 -tubulin targets of the benzimidazole 
group have been described previously [47]). 

The pLGICs regulate the flow of anions, typically chlor- 
ide ions, or cations, including sodium and calcium, in 
response to an extracellular signal in the form of an acti- 
vating ligand or change in pH. They are fundamental to 
synaptic transmission; interference with their normal func- 
tion results in paralysis and death. Drugs that activate the 
anionic channels, such as the macrocyclic lactone ivermec- 
tin (IVM), typically inhibit neuronal transmission and 
muscle contraction. Those that activate the cationic chan- 
nels, such as levamisole (LEV) and monepantal, stimulate 
neuronal transmission and typically induce muscle con- 
traction. Here we present the most complete picture of 
these channels to date and show that, as expected, this 
parasite is very similar to C. elegans. This supports the use 
of the free-living worm as a functional model for the para- 
site nervous system. There are, however, some important 
differences, most significantly in glutamate signaling, 
which is sensitive to the macrocyclic lactones, and acetyl- 
choline signaling, which is sensitive to LEV, as well as 
characteristic loss of some receptor genes associated with 
biogenic amine signaling. 

The macrocyclic lactones, which include IVM and 
moxidectin, act at several different glutamate-gated 
chloride channels. Some of these are found in both 
C. elegans and H. contortus, but it is noteworthy that 
two H. contortus subunit genes, glc-5 and glc-6, encode 
glutamate-sensitive channels that are absent from 
C. elegans. It is likely these were lost from the rhabtidid 
lineage as homologous sequences can be detected in the 
genome of the close relative P. pacificus. Both of these 
subunits are targets for the macrocyclic lactones [48,49] 
and changes in either their sequence or expression 
have been associated with drug resistance in veterinary 
parasites [50]. The archetypal target of IVM, Cel-glc-1, 
appears to be a duplication of avr-15, specific to 
C. elegans. These differences may explain the difficulties 
encountered in understanding IVM's mode of action 
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Figure 4 Maximum-likelihood phylogenies of H. contortus and C. elegans ligand-gated ion channel families. Figure shows unrooted trees 
for (a) cation channel and (b) anion channel genes from the two nematode species, with different colors representing each major clade of 
channels. Within clades, C. elegans genes are colored a lighter shade than H. contortus genes. Both trees show a general pattern of conserved 
one-to-one orthology, and colored bars indicate where this pattern is broken - by a duplication in H. contortus (with four copies of unc-29), and 
by loss of some genes - there are four H. contortus genes (acr-26, acr-27, glc-5, glc-6) without orthologs in C. elegans, and seven in C. elegans 
missing from H. contortus (acr-13/lev-8, acr-18, acr-23, glc-l, lev-8, lgc-48, lgc-54). All of these genes are shown in large bold type. A truncated 
ortholog of C. elegans acr-9 is present in H. contortus but is omitted from this phylogeny. Values near internal nodes are Shimodaira-Hasegawa 
test support values - with support values of 1.0 on all other nodes omitted for clarity. 
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and resistance in parasitic nematodes by studying 
the model organism [51]. Most of the other anionic 
channel subunits in H. contortus have direct orthologs 
in C. elegans (Figure 4a). The only notable absences are 
lgc-48 and lgc-54. lgc-48 encodes an as yet uncharacter- 
ized member of the acc-1 acetylcholine gated chloride 
channels [52] that appears to have been lost from 
H. contortus, and lgc-54 encodes an uncharacterized 
member of the C. elegans biogenic amine-gated chloride 
channel family that may not be present in H. contortus 
[53]. 

The acetylcholine gated receptor cation channels are of 
particular interest as they are the targets of several anthel- 
mintic drugs already in use (LEV, monepantel (MPTL) 
and derquantel) and are considered to be one of the most 
promising gene families for the identification of new drug 
targets. Comparison of the H. contortus gene family with 
those found in C. elegans is shown in Figure 4b. There is a 
one-to-one correspondence in H. contortus for subunits of 
the acr-16 clade [54], named for the homomeric nicotine- 
sensitive channel in C. elegans, and acr-16 expression is 
much higher in adult males than in other life-stages, as 
previously described for C. elegans [55]. The anthelmintic 
LEV targets receptors composed of a subunits of the unc- 
38 clade and non-a subunits from the unc-29 clade. In 
C. elegans three different alpha and two different non- 
alpha subunits combine to form a channel that responds 
to acetylcholine, strongly to LEV and only weakly to nico- 
tine [56]. The a-type acr-13 {lev-8) gene, required in 
C. elegans for the LEV receptor, appears to have been lost 
in H. contortus as an ortholog of this gene is detectable in 
P. pacificus. The non-alpha gene lev-1 is present, but the 
signal peptide appears to have been lost [57]. A LEV 
receptor in H. contortus can be reconstituted without 
either of these, requiring only UNC-38, UNC-63 and 
UNC-29 as in C. elegans [57] and replacing ACR-13 with 
ACR-8. Most strikingly, the non-a subunit gene unc-29 
in C. elegans corresponds to four paralogous copies in 
H. contortus [57]. A LEV-sensitive channel has been 
reconstituted containing the Hco-UNC-29.1 subunit [58]. 
The degree to which the other copies may have diverged 
in function is currently under investigation. The deg-3 
clade [54] encodes subunits that form channels involved 
in chemotaxis, and is of particular interest as channels 
encoded by des-2/deg-3 and acr-23 in C. elegans are 
targeted by the relatively new anthelmintic MPTL, with 
acr-23 as the principal target of MPTL action in vivo [9]. 
The single subunit gene mptl-1 in H. contortus corre- 
sponds to the acr-231 acr-20 pair in C. elegans. Splice site 
mutations in the mptl-1 gene are associated with resis- 
tance to MPTL [36], which would suggest that acr-23 is 
functionally equivalent to mptl-1. 

H. contortus possesses two acetylcholine gated recep- 
tor genes that are not present in C. elegans, acr-26 and 



acr-27, and these two genes seem to form a distinct 
clade, which suggests that they are likely to form recep- 
tors with a distinct pharmacology. Orthologs of acr-26 
are found in many species of parasitic nematode and 
therefore they might make interesting targets for the 
development of novel cholinergic anthelmintics. 

In summary, the majority of the pLGIC subunit genes 
in C. elegans have direct orthologs in H. contortus, 
although there is more variation in the repertoire of 
orphan family pLGICs (Additional results in Additional 
file 1). This supports the use of C. elegans as a model to 
study the neuromusculature of H. contortus in drug 
screens, but there are important differences in specific 
anthelminthic targets: for each of the anthelmintics 
IVM, LEV and MPTL, the drug targets in H. contortus 
are functionally different to the C. elegans model. 

Drug metabolism and efflux 

Parasitic nematodes are armed with a large repertoire of 
inducible metabolizing enzymes and transporters to protect 
against environmental toxins. The nematode detoxification 
pathway can be divided into three main phases: modifica- 
tion, conjugation and excretion [59] involving the cyto- 
chrome P450s (CYPs) and the short-chain dehydrogenase/ 
reductases (SDRs) in phase 1, the UDP-glucoronosyl trans- 
ferases (UGTs) and the glutathione S-transferases (GSTs) 
in phase 2 and the ATP-binding cassette (ABC) transpor- 
ters in phase 3. Little is known about the impact of the 
parasite detoxification system on anthelminthic efficacy or 
resistance [60], but a better understanding of these path- 
ways will allow their role in anthelmintic resistance to be 
assessed and should also be informative in the design of 
new drugs and synergistic agents [61]. We have annotated 
a large number of modification and conjugation genes in 
the current assembly, identifying a total of 42 CYPs, 44 
short-chain dehydrogenase/reductases, 34 UDP-glucorono- 
syl transferases and 28 glutathione S-transferases (Figure 
S5 in Additional file 1), but here we focus on the excretion 
transporters implicated in drug resistance. 

Members of the ABC transporter family hydrolyze ATP 
and couple the energy released with the active transport 
of a wide range of compounds, including small organic 
molecules, lipids, proteins and metal ions. They are an 
essential component of many biological processes and 
are fundamental to the barrier between a nematode and 
its environment. These transporters consist of a basic 
structure of six transmembrane domains with an asso- 
ciated intracellular ATP binding motif. The functioning 
transporter complex requires two of these protein halves, 
but some members of this family are fusion proteins with 
12 transmembrane domains and two ATP binding 
motifs. We find numerous differences in ABC transporter 
genes between C. elegans and H. contortus (Additional 
results and Figure S4 in Additional file 1), with a reduced 
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complement of haf-transporters, including, for example, 
the loss of haf-6, which is essential for efficient RNA 
interference in C. elegans [62], a significant expansion 
of ced-7, which has an unknown function in amphid and 
phasmid sensory cells, and an extensive change in the 
repertoire of multidrug resistance protein genes. The 
P-glycoprotein (pgp) transporters are of particular inter- 
est as they have been implicated in resistance of H. con- 
tortus to IVM and other anthelmintics [49,63], and this 
family has undergone significant change compared to the 
free living C. elegans. In total, 10 P-glycoprotein genes 
have been identified in the H. contortus genome and 
knowledge of this full complement will now allow a more 
systematic analysis of the role of P-glycoproteins in resis- 
tance to IVM and other anthelmintics. A cluster of four 
C. elegans genes, pgp-5, 6, 7 and 8, are not found in 
H. contortus. Cel-pgp-3 and 4 as well as Cel-pgp-12, 13 
and 14 represent gene duplication events corresponding 
to single genes in the parasite. Cel-pgp-9 corresponds to 
two paralogous copies in H. contortus, and in addition, 
two genes not present in C. elegans, related to pgp-3 and 
pgp-11, have been retained in H. contortus. These are 
named pgp-16 and pgp-17, respectively. Changes in the 
sequence or expression of pgp-1, pgp-2 and pgp-9 have 
been reported for IVM-resistant versus-susceptible iso- 
lates of H. contortus and in pgp-9 for resistant Telodorsa- 
gia circumcincta [50,64,65]. 

Protease vaccine candidates 

H. contortus is a voracious blood-feeder, with even modest 
infections of 1,000 worms generating losses of up to 50 ml 
of blood per day [66]. Cysteine, aspartic and metallo- 
proteases as well as aminopeptidases have been implicated 
in important aspects of parasite function, including hemo- 
globin digestion and anticoagulant activity, and these 
enzymes are also important vaccine candidates. Vaccina- 
tion with gut extracts enriched for these activities can con- 
fer up to 75% reduction in worm burden and 90% 
reduction in egg output [67]. Protection is thought to 
result from the ingestion of host antibodies by the parasite 
during blood-feeding, which bind to gut antigens and dis- 
rupt function. Female parasites appear to be more affected 
than males, increasing the relative impact on egg output. 
Development of a commercial vaccine, however, requires 
identification and expression of specific proteases that, 
either singly or combined, induce protective immunity. 
Comparative genomics and transcriptome data can aid 
target selection by identifying potential functionally impor- 
tant proteases and those enriched in the gut of blood- 
feeding L4 and adult stages, suggesting a role in blood- 
feeding as well as accessibility to host antibodies. 

Cathepsin B protease (cbl) genes are part of an ordered 
hemoglobin degradation pathway, functioning after aspar- 
tic proteases (APRs) and upstream of metalloproteases 



(MEPs) and aminopeptidases [68,69]. Cathepsin B diver- 
sity may therefore be key in generating an array of sub- 
strates from ingested nutrients for efficient cleavage 
by downstream proteases and may be involved in the 
high blood digestion capacity of H. contortus. Indeed, 
H. contortus has a higher copy number (63 genes) of 
cathepsin B protease genes than related free-living nema- 
todes, representing 80% of all cathepsin cysteine protease 
genes in the genome (Figure 5a). This large expansion of 
the cathepsin B family has resulted in H. contortus show- 
ing a greater diversity of cbl genes than known in 
any other parasitic nematode. Uniquely, members of each 
cbl type are arranged in tandem in the H. contortus gen- 
ome (Figure 5b) and have the same genomic structure, 
suggesting that diversity has arisen through recent gene 
duplication. Reflecting this, most CBLs encoded in the 
H. contortus genome form large monophyletic groups dis- 
tinct from C. elegans [70], hookworm [71] or other stron- 
gylid CBLs, suggesting that duplication and divergence has 
occurred separately following speciation. It is possible that 
gene amplification has occurred as a mechanism of 
increasing overall CBL expression associated with the 
need for H. contortus to digest the huge quantities of host 
blood it takes in during feeding. The cbl genes show 
increased expression in L4 and adult stages and many are 
significantly enriched in gut tissue, identifying these as 
potentially important control targets. These include both 
novel and previously identified cbl genes (AC-4, gcp-7 and 
hmcp-2) [72-74], while other cbl transcripts are signifi- 
cantly up-regulated in the L4 stage and in adult male 
worms, but not in the gut, suggesting a role in develop- 
ment and reproduction rather than feeding. The CBLs 
identified here contain a putative amino-terminal signal 
peptide and several have been identified in adult worm 
excretory-secretory products [75]. It has also been pro- 
posed that sequence variation of Hc-CBLs may confer 
antigenic diversity [76], and presentation to the host 
immune system through secretion may therefore drive the 
maintenance of the diversity of Hc-cbl genes. Three- 
dimensional modeling of the CBL repertoire and epitope 
mapping will clarify this issue. 

Other cathepsin cysteine protease genes are not 
expanded - for example, a cpr-6-like single copy gene is 
highly conserved in a number of parasitic and free-living 
nematodes, suggesting a house-keeping role, and there 
is no expansion found for cathepsin L, F or Z genes in 
H. contortus. Furthermore, there is less expansion of 
genes encoding other vaccine candidates - H-gal-GP 
composed predominantly of pepsinogen-like aspartic 
and metallo proteases [77], and Hll aminopeptidases 
[78]. These are integral gut membrane proteins, consid- 
ered hidden from the immune system during natural 
infection [67], and lower diversity should be advanta- 
geous in vaccination studies. 
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Figure 5 (a) Maximum-likelihood phylogeny of cathepsin B (cbl) genes in H. contortus and related nematodes. The H, contortus genome 
shows large, independent expansions of all three main classes of cbl genes: AC (anticoagulant)-like, GCP-7-like and Hmcp-like. (b) Many of the 
genes in two of these expansions occur as tandem arrays of duplicates on three scaffolds of the H. contortus genome. Two of these scaffolds are 
known to be linked, as eight of the genes occur on a previously completed bacterial artificial chromosome (BAC) sequence. 



Significant expansion is, however, seen in the cathe- 
psin aspartic protease family (Figure S6 in Additional 
file 1). Importantly, the genome data identify a novel, 
single-copy apr gene with 84% amino acid identity to 



APR-1 of the dog hookworm Ancylostoma caninum and 
that is significantly enriched in gut tissue. Vaccination 
with recombinant Ac- APR-1 significantly reduced fecal 
egg counts, worm burdens and anemia [79], warranting 
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investigation of the protective potential of the Hae- 
monchus APR-l-related protease. Other novel Hc-apr 
genes group phylogenetically with C. elegans asp genes 
(Figure S6 in Additional file 1) and are increased in the 
environmental LI stage, indicating a developmental role, 
consistent with C. elegans data [80,81] and ruling these 
out as potential vaccine targets. 

Conclusions 

H. contortus is the one of the most intensively researched 
parasitic nematode species and is the most established 
parasite species used in drug discovery, drug mode-of- 
action and drug resistance research. It is the first strongylid 
nematode to have its genome sequenced and the analysis 
presented here provides a first genomic insight into the 
biology of a gastro-intestinal strongylid and blood feeding 
parasitic nematode. It confirms the close relationship 
between H. contortus and C. elegans and provides a plat- 
form from which to apply the C. elegans biological knowl- 
edge to investigate the biology of this parasite for both 
basic and applied research. It also highlights specific differ- 
ences between the species that will be important to inform 
researchers of those aspects of C. elegans biology that 
cannot be extrapolated. The genome sequence, gene mod- 
els, transcriptome datasets and bioinformatic analyses pro- 
vide a wealth of information on potential new drug and 
vaccine targets. They will also be an invaluable resource for 
the application of post-genomic technologies to this and 
other related strongylid parasites. 

Materials and methods 

Parasite material production and quality assurance 

The MHco3(ISE) line is a H. contortus strain, susceptible 
to all the major anthelmintic classes, that has been pas- 
saged in the laboratory by serial infection for many years. 
It has been extensively phenotypically and genetically 
characterized previously [82]. The inbred MHco3(ISE).Nl 
strain was derived by a genetically validated single pair 
mating of an adult male and an adult female MHco3(ISE) 
worm using a method of direct transplantation into the 
abomasum of a recipient parasite-free sheep (described in 
[83]). The inbred MHco3(ISE).Nl strain was subsequently 
passaged by oral experimental infection of parasite-free 
sheep and parasite material was harvested using standard 
methods (see Additional methods in Additional file 1 for 
details). All parasite material used in this study was 
derived from the inbred H. contortus MHco3(ISE).Nl 
strain except for the female gut transcriptomic data, which 
were generated from the MHco3(ISE)strain. The MHco3 
(ISE) and MHco3(ISE).Nl strains were routinely moni- 
tored for genetic integrity at each passage, as was all para- 
site material derived from the strains, using a standard 
panel of microsatellite markers and a well established 
'genetic fingerprinting' methodology [8,82]. Both the 



MHco3(ISE) and the MHco3(ISE).Nl are viably cryopre- 
served and available on request. 

Genome sequencing and assembly 

We assembled a draft sequence of the H. contortus gen- 
ome based on data from a mixture of sequencing technol- 
ogies, including 21 x coverage of paired-end and shotgun 
reads from a Roche454SLX platform and approximately 
163 x coverage of long- and short-insert paired-end 
libraries from an Illumina HiSeq (coverage based on a gen- 
ome size of 370 Mb). Genomic and transcriptomic 
sequence data were generated using largely standard mole- 
cular biology methods (Additional methods in Additional 
file 1), except that whole-genome amplified material was 
used to generate sufficient material for a large-insert 
Illumina library from a single male worm, using a modified 
protocol. The genomic reads from each technology were 
initially assembled independently using assembly 
algorithms most suited to the typical coverage and read 
length of each, and scaffolds from each were then merged 
to form an initial set of scaffolds that were then improved 
by automated gap filling, followed by breaking and re- 
scaffolding using the paired-end data from both sequen- 
cing platforms. 

Protein coding gene prediction and functional annotation 

After quality control and end-trimming, the transcrip- 
tome reads were mapped against the reference genome 
using TopHat software, v 2.0.6 (-mate-std-dev 50 -a 6 -i 
10 -I 20000 -microexon-search -min-segment-intron 
10 -max-segment-intron 50000) [84]. A reference data- 
set of 399 H. contortus protein encoding genes was 
manually curated from predictions of highly conserved 
genes using CEGMA (version 2.4) [24] and RNA-seq 
mapping. Of those, 347 were used to train Augustus 
[85] and 52 were used to independently evaluate the 
accuracy of the predictions. Final gene prediction (v2.0) 
was performed by Augustus using H. contortus specific 
parameters and RNA-seq, EST and polyA mappings as 
evidence hints generated by TopHat2 and PASA [86], 
respectively. Gene prediction accuracy was computed at 
the level of nucleotides, exons and complete genes on 
52 manual curated gene models as described previously 
[87] and shown in Table S7 in Additional file 1). Func- 
tional annotation information was obtained from the 
interpro databases using interproscan v4.5 [88], GO [89] 
terms were annotated via interpro2GO and from the 
curated C. elegans annotation in Wormbase (release 235) 
[90] by assigning all GO terms shared by all C. elegans 
genes in a gene family to the H. contortus members of 
that family. Further functional insight was obtained by 
BLAST searches for similar genes in the GenBank nr 
database, and putative signal peptides were identified by 
SignalP [91]. To investigate H. contortus metabolism, a 
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total of 828 ECs, covering 2,853 proteins, were assigned 
using KAAS [92], DETECT [93] and EFICAz [94]. Of 
these, 563 ECs covering 1,246 proteins were assigned to a 
metabolic pathway; the others are non-metabolic 
enzymes. Similar annotation efforts were carried out on 
Caenorhabditis species and P. pacificus. 

Gene expression and Gene Ontology analysis 

The numbers of RNA-seq reads per gene model were 
counted using custom-made scripts making use of BED- 
tools and a gff file of the genome annotation, and based 
on the TopHat mapping described above. Analysis of gene 
expression was performed using the DESeq (v 1.8.1) pack- 
age for Bioconductor [95] . Read coverage was normalized 
to estimate the effective library sizes for each library and 
negative binomial tests performed between pairs of sample 
triplicates, using dispersion estimates from the default 
approach, to obtain P-values for differential expression 
of each gene adjusted for false discovery rate using the 
Benjamini-Hochberg procedure for multi-testing [96]. 
Only genes with adjusted P-values <le~ were retained. 
GO terms enriched in the set of differentially expressed 
genes in each comparison were identified using the 
'weightOl' algorithm of the TopGO (v 2.8.0) package for 
Bioconductor [97]. Only GO terms with P < 0.01 were 
considered for more detailed analysis. Expression data 
were drawn using Circos-0.62. All differentially expressed 
genes were clustered using MBCluster.seq with 50 clus- 
ters, and clusters then ordered based on which stage had 
the highest mean expression in that cluster. 

Data access 

All sequences described in this paper have been sub- 
mitted to the GenBank database, project ID PRJEB506. 
Sequence data are available at [98] and annotation has 
been submitted to GenBank and Wormbase. ENA acces- 
sion numbers for all genomic and RNA-seq reads are 
listed in Tables S3, S13 and S14 in Additional file 1. 
The H. contortus genome assembly and functional anno- 
tation are available at [99]. 
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